Sentence-level MT evaluation
Authors
Abstract
In this paper we investigate the possibility of evaluating MT quality and fluency at the sentence level in the absence of reference translations. We measure the correlation between automatically generated scores and human judgments, and we evaluate the performance of our system when used as a classifier for identifying highly dysfluent and ill-formed sentences. We show that we can substantially improve on the correlation between language model perplexity scores and human judgment by combining these perplexity scores with class probabilities from a machine-learned classifier. The classifier uses linguistic features and has been trained to distinguish human translations from machine translations. We show that this approach also performs well in identifying dysfluent sentences.
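As an illustration of the combination described above, here is a minimal sketch, assuming a target-language model exposing a per-sentence perplexity() method, a featurize() function producing linguistic feature vectors, and a scikit-learn classifier; all of these names are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch: blend LM perplexity with a classifier's class
# probability into a single sentence-level fluency score.
# `lm`, `clf`, and `featurize` are assumed components.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_human_vs_machine(feature_vectors, labels):
    """Train a classifier to separate human (label 1) from machine
    (label 0) translations, given linguistic feature vectors."""
    return LogisticRegression(max_iter=1000).fit(feature_vectors, labels)

def fluency_score(sentence, lm, clf, featurize, alpha=0.5):
    """Combine normalized perplexity with P(human | features)."""
    ppl = lm.perplexity(sentence)          # lower perplexity = more fluent
    lm_score = 1.0 / (1.0 + np.log(ppl))   # squash into (0, 1]
    p_human = clf.predict_proba([featurize(sentence)])[0, 1]
    return alpha * lm_score + (1.0 - alpha) * p_human
```

The linear interpolation with alpha is one simple way to combine the two signals; the two scores could equally well be fed jointly into a second-stage model.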
Similar resources
The Back-translation Score: Automatic MT Evaluation at the Sentence Level without Reference Translations
Automatic tools for machine translation (MT) evaluation such as BLEU are well established, but have the drawbacks that they do not perform well at the sentence level and that they presuppose manually translated reference texts. Assuming that the MT system to be evaluated can deal with both directions of a language pair, in this research we suggest to conduct automatic MT evaluation by determini...
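A minimal sketch of the round-trip idea, assuming a hypothetical translate(text, src, tgt) wrapper around the MT system under evaluation, with NLTK's sentence-level BLEU as the similarity measure:

```python
# Hedged sketch of a back-translation score: translate the MT output
# back into the source language and compare it with the original
# source sentence. translate() is an assumed wrapper, not a real API.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def back_translation_score(source, mt_output, translate, src="en", tgt="de"):
    """Round-trip mt_output (in tgt) back into src and BLEU-score it
    against the original source sentence (language codes illustrative)."""
    back = translate(mt_output, src=tgt, tgt=src)   # tgt -> src direction
    smooth = SmoothingFunction().method1            # avoid zero n-gram counts
    return sentence_bleu([source.split()], back.split(),
                         smoothing_function=smooth)
```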
Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence
The task of machine translation (MT) evaluation is closely related to the task of sentence-level semantic equivalence classification. This paper investigates the utility of applying standard MT evaluation methods (BLEU, NIST, WER and PER) to building classifiers to predict semantic equivalence and entailment. We also introduce a novel classification method based on PER which leverages part of s...
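For intuition, a sketch of PER as a pairwise feature, using the common bag-of-words formulation of the position-independent error rate; the symmetric feature vector and the SVM are assumptions, not the paper's exact pipeline:

```python
# Hedged sketch: MT metrics as features for semantic-equivalence
# classification between two sentences.
from collections import Counter
from sklearn.svm import SVC

def per(ref_tokens, hyp_tokens):
    """Position-independent error rate: word errors ignoring order."""
    matches = sum((Counter(ref_tokens) & Counter(hyp_tokens)).values())
    return (max(len(ref_tokens), len(hyp_tokens)) - matches) / len(ref_tokens)

def pair_features(sent_a, sent_b):
    a, b = sent_a.split(), sent_b.split()
    return [per(a, b), per(b, a)]   # both directions, since PER is asymmetric

# Assumed usage: pairs of sentences with equivalence labels.
# clf = SVC(probability=True).fit(
#     [pair_features(a, b) for a, b in pairs], labels)
```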
Regression for Sentence-Level MT Evaluation with Pseudo References
Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. We present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency a...
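A sketch of the pseudo-reference idea under stated assumptions: other systems' outputs stand in for human references, their per-sentence metric scores become weak indicator features, and a regression model maps those features to a quality score (the indicator set and model choice here are illustrative):

```python
# Hedged sketch: regression-based sentence scoring with pseudo
# references (other systems' outputs in place of human translations).
from sklearn.svm import SVR

def pseudo_ref_features(hypothesis, pseudo_refs, metric):
    """One metric score per pseudo reference, treated as weak
    fluency/adequacy indicators rather than gold-standard comparisons."""
    return [metric(ref, hypothesis) for ref in pseudo_refs]

# Assumed training setup: X = feature vectors for sentences that have
# human quality judgments y.
# model = SVR(kernel="rbf").fit(X, y)
# model.predict([pseudo_ref_features(hyp, refs, metric)]) -> score
```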
Normalized Compression Distance as automatic MT evaluation metric
This paper evaluates a new automatic MT evaluation metric, Normalized Compression Distance (NCD), which is a general tool for measuring similarities between binary strings. We provide system-level correlations and sentence-level consistencies to human judgements and comparison to other automatic measures with the WMT’08 dataset. The results show that the general NCD metric is at the same level ...
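NCD itself is straightforward to compute; here is a minimal sketch with zlib standing in for the compressor C, following the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)):

```python
# Normalized Compression Distance with zlib as the compressor C(.);
# lower values mean more similar strings.
import zlib

def c(s: str) -> int:
    """Compressed length of s in bytes."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

# Illustrative use: compare an MT hypothesis with a reference string.
# ncd("the cat sat on the mat", "a cat sat on a mat")
```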
Towards a Predicate-Argument Evaluation for MT
HMEANT (Lo and Wu, 2011a) is a manual MT evaluation technique that focuses on the predicate-argument structure of the sentence. We relate HMEANT to an established linguistic theory, highlighting the possibilities of reusing existing knowledge and resources for interpreting and automating HMEANT. We apply HMEANT to a new language, Czech in particular, by evaluating a set of English-to-Czech MT system...
Journal:
Volume/Issue:
Pages: -
Publication date: 2005